Regulating Reward Training by Means of Certainty Prediction in a Neural Network-Implemented Pong Game
Abstract
We present the first reinforcement-learning model that self-improves its reward-modulated training by means of a continuously improving “intuition” neural network. An agent was trained to play the arcade video game Pong under two reward-based alternatives: one in which the paddle was placed randomly during training, and a second in which the paddle was simultaneously trained on three additional neural networks so that it could develop a sense of “certainty” about how likely its own predicted paddle position is to return the ball. If the agent was less than 95% certain of returning the ball, the policy used an intuition neural network to place the paddle. We trained both architectures for an equivalent number of epochs and tested learning performance by letting the trained programs play against a near-perfect opponent. We found that the reinforcement-learning model that uses an intuition neural network to place the paddle during reward training quickly overtakes the simple architecture in its ability to outplay the near-perfect opponent, and outscores that opponent by an increasingly wide margin with additional epochs of training.
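The certainty gate described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `choose_paddle_position`, its callable parameters, the `State` layout, and the toy stand-ins for the networks are all assumptions made for the example. Only the 95% threshold comes from the abstract, and the sketch further assumes that the agent's own predicted position is used whenever it is certain enough.

```python
from typing import Callable, Tuple

CERTAINTY_THRESHOLD = 0.95  # the 95% cutoff stated in the abstract

# Hypothetical state layout: (ball_y, ball_velocity_y), both normalized.
State = Tuple[float, float]


def choose_paddle_position(
    state: State,
    policy_position: Callable[[State], float],
    certainty: Callable[[State, float], float],
    intuition_position: Callable[[State], float],
    threshold: float = CERTAINTY_THRESHOLD,
) -> float:
    """Pick the paddle position used during reward training.

    Assumption: if the estimated probability of returning the ball from
    the policy's predicted position is below the threshold, the intuition
    network's placement is used; otherwise the prediction is kept.
    """
    predicted = policy_position(state)
    if certainty(state, predicted) < threshold:
        return intuition_position(state)
    return predicted


# Toy stand-ins for the trained networks (hypothetical, for illustration).
def toy_policy(state: State) -> float:
    return 0.5  # always predicts the middle of the court


def toy_intuition(state: State) -> float:
    return state[0]  # places the paddle at the ball's height


def toy_certainty(state: State, paddle_y: float) -> float:
    # Certainty falls off linearly with distance between ball and paddle.
    return max(0.0, 1.0 - abs(state[0] - paddle_y))
```

With these stand-ins, a ball at height 0.5 keeps the policy's prediction, while a ball at height 0.9 falls below the threshold and hands placement to the intuition stand-in.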
Similar resources
Presenting an MLP Neural Network for Frost Prediction in Kermanshah Province
Using minimum temperature data, this study addresses frost prediction over a 21-year period in Kermanshah province by means of a neural network. To forecast frost, the data were converted to values between 0 and 1 by means of a subjective and one-to-one (injective) function. We used a feed-forward neural network with one hidden interior layer with number of chang...
Traffic Signal Prediction Using Elman Neural Network and Particle Swarm Optimization
Prediction of traffic is very crucial for its management. Because of human involvement in the generation of this phenomenon, traffic signal is normally accompanied by noise and high levels of non-stationarity. Therefore, traffic signal prediction as one of the important subjects of study has attracted researchers’ interests. In this study, a combinatorial approach is proposed for traffic signal...
RTDGPS Implementation by Online Prediction of GPS Position Components Error Using GA-ANN Model
If both Reference Station (RS) and navigational device in Differential Global Positioning System (DGPS) receive signals from the same satellite, RS Position Components Error (RPCE) can be used to compensate for navigational device error. This research used hybrid method for RPCE prediction which was collected by a low-cost GPS receiver. It is a combination of Genetic Algorithm (GA) computing an...
Signal Prediction by Layered Feed-Forward Neural Network (RESEARCH NOTE).
In this paper a nonparametric neural network (NN) technique for prediction of future values of a signal based on its past history is presented. This approach bypasses modeling, identification, and parameter estimation phases that are required by conventional parametric techniques. A multi-layer feed forward NN is employed. It develops an internal model of the signal through a training operation...
Global Solar Radiation Prediction for Makurdi, Nigeria Using Feed Forward Backward Propagation Neural Network
The optimum design of solar energy systems strongly depends on the accuracy of solar radiation data. However, the availability of accurate solar radiation data is undermined by the high cost of measuring equipment or non-functional ones. This study developed a feed-forward backpropagation artificial neural network model for prediction of global solar radiation in Makurdi, Nigeria (7.7322 N lo...
Journal title:
- CoRR
Volume: abs/1609.07434; Issue: -
Pages: -
Publication date: 2016